Ordering Depth First Search to Improve AFD Mining

Authors

  • Jeremy T. Engle
  • Edward L. Robertson
  • Dimitar G. Nikolov
Abstract

This paper describes a new search algorithm, bottom-up attribute keyness depth-first search (BU-AKD), for mining powerset lattices using a monotonic approximation measure, characteristics present in many problem domains. The research reported here focuses on one of these problem domains, the discovery of Approximate Functional Dependencies (AFDs). AFDs are measured versions of functional dependencies, which have received attention from the relational database and machine learning communities. Bottom-up depth-first search (BU-DFS) algorithms in general can improve efficiency over the traditional bottom-up breadth-first search algorithm by using information learned as the search space is explored to avoid calculating the approximation measure. The goal of BU-AKD is to resolve one important drawback of BU-DFS algorithms that makes their use in practice problematic: their inconsistent runtime performance as search parameters vary. BU-AKD uses a heuristic that adapts to the search parameters to guide the exploration of the lattice, thus providing consistent performance comparable to the best-performing BU-DFS algorithms. This paper reports a variety of experiments that evaluate BU-AKD and other algorithms using an algorithmic, machine-independent cost measure, as well as traditional runtime tests. Experimental results show that BU-AKD performs consistently well and validate a number of insights used in its design.
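The general idea of a bottom-up depth-first search over a powerset lattice with a monotonic measure can be illustrated with a minimal sketch. This is not BU-AKD: the abstract does not specify its keyness heuristic, so the sketch uses a fixed attribute order. The error measure (the fraction of rows that would have to be removed for the dependency to hold exactly), the function names `g3_error` and `mine_afds`, and the sample rows are all assumptions introduced for illustration.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Assumed measure: fraction of rows to remove so that lhs -> rhs holds exactly.

    Adding attributes to lhs can only lower this value, so it is monotonic
    over the powerset lattice of candidate left-hand sides.
    """
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[a] for a in lhs)][row[rhs]] += 1
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1 - kept / len(rows)


def mine_afds(rows, attrs, rhs, eps):
    """Bottom-up DFS over subsets of attrs (hypothetical helper, fixed order).

    Returns the minimal left-hand sides whose error is <= eps, plus the
    number of times the measure was actually computed.
    """
    found = []        # minimal satisfying left-hand sides discovered so far
    evaluations = 0   # a machine-independent cost: calls to the measure

    def dfs(lhs, limit):
        nonlocal evaluations
        candidate = frozenset(lhs)
        # Information learned earlier: a superset of a known AFD also
        # satisfies the threshold (monotonicity) but is not minimal,
        # so the measure does not need to be computed for it.
        if any(f <= candidate for f in found):
            return
        evaluations += 1
        if g3_error(rows, lhs, rhs) <= eps:
            found.append(candidate)
            return  # prune: everything above this node is non-minimal
        # Only extend with attributes of lower index than the last one added;
        # this order reaches every subset before any of its supersets.
        for i in range(limit):
            dfs(lhs + [attrs[i]], i)

    dfs([], len(attrs))
    return found, evaluations


# Made-up sample relation in which b determines d exactly.
rows = [
    {"a": 1, "b": 1, "c": 1, "d": "x"},
    {"a": 1, "b": 2, "c": 1, "d": "y"},
    {"a": 2, "b": 1, "c": 2, "d": "x"},
    {"a": 2, "b": 2, "c": 2, "d": "y"},
]
minimal, evaluations = mine_afds(rows, ["a", "b", "c"], "d", eps=0.0)
print(minimal, evaluations)
```

In this sketch the order in which attributes are tried is fixed in advance; the abstract indicates that choosing this order adaptively is where BU-AKD differs from plain BU-DFS.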

Similar Resources

SmartMiner: A Depth First Algorithm Guided by Tail Information for Mining Maximal Frequent Itemsets

Maximal frequent itemsets (MFI) are crucial to many tasks in data mining. Since the MaxMiner algorithm first introduced enumeration trees for mining MFI in 1998, there have been several methods proposed to use depth first search to improve performance. To further improve the performance of mining MFI, we proposed a technique to gather and pass tail (of a node) information to determine the next ...

Efficient Maximal Frequent Itemset Mining by Pattern-Aware Dynamic Scheduling

While frequent pattern mining is fundamental for many data mining tasks, mining maximal frequent itemsets efficiently is important in both theory and applications of frequent itemset mining. The fundamental challenge is how to search a large space of item combinations. Most of the existing methods search an enumeration tree of item combinations in a depth-first manner. In this thesis, we develop...

Modularizing Data Mining: a Case Study Framework

This paper presents the fundamental concepts underpinning MoLS, a framework for exploring and applying many variations of algorithms for one data mining problem: mining a database relation for Approximate Functional Dependencies (AFDs). An engineering approach to AFD mining suggests a framework which can be customized with plug-ins, yielding targetability and improved performance. This paper org...

Eager st-Ordering

Given a biconnected graph G = (V,E) with edge {s, t} ∈ E, an st-ordering is an ordering v1, . . . , vn of V such that s = v1, t = vn, and every other vertex has both a higher-numbered and a lower-numbered neighbor. Previous linear-time st-ordering algorithms are based on a preprocessing step in which depth-first search is used to compute lowpoints. The actual ordering is determined only in a se...

Sequential Mining: Patterns and Algorithms Analysis

This paper presents and analyzes the common existing sequential pattern mining algorithms. It classifies sequential pattern mining algorithms into five broad classes: first, Apriori-based algorithms; second, those based on a Breadth First Search strategy; third, those based on a Depth First Search strategy; fourth, sequential closed-pattern algorithms; and fifth, on the basis of i...

Publication date: 2010